Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes

Author

  • Josep Domingo-Ferrer
Abstract

Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal/plant species). Such taxonomic data are often treated as flat nominal data without hierarchy, which loses substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that makes many of the algorithms and analytical techniques designed for numerical data applicable to them. We show how to compute descriptive statistics such as the mean, the variance and the covariance on marginality-mapped data. We also define a mathematical distance between records that include hierarchical attributes, based on marginality-based variances. This distance paves the way to reusing, on taxonomic data, clustering and anonymization techniques designed for numerical data.
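The abstract does not spell out how marginality is computed, so the following is only a rough sketch under stated assumptions: the taxonomy is a tree, the distance between two values is the number of edges on the path joining them, and the marginality of a value is the sum of its distances to all values in the sample. The toy taxonomy, the sample and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical toy taxonomy, stored as child -> parent (root has parent None).
PARENT = {
    "disease": None,
    "infectious": "disease",
    "chronic": "disease",
    "flu": "infectious",
    "measles": "infectious",
    "diabetes": "chronic",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the path between a and b (assumed distance)."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # first common ancestor reached from b
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def marginality(value, sample):
    """Assumed definition: sum of tree distances from `value` to the sample."""
    return sum(tree_distance(value, other) for other in sample)


sample = ["flu", "flu", "measles", "diabetes"]
scores = {v: marginality(v, sample) for v in set(sample)}
# Under this assumed definition, the value with the smallest score is the
# most central one in the sample and could play the role of a mean.
most_central = min(scores, key=scores.get)
```

With a mapping of this kind, each nominal value receives a number that reflects its position in the hierarchy relative to the sample, which is what lets numerical techniques be applied to the data.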

Similar resources

Anonymization Methods for Taxonomic Microdata

Often microdata sets contain attributes which are neither numerical nor ordinal, but take nominal values from a taxonomy, ontology or classification (e.g. diagnosis in a medical data set about patients, economic activity in an economic data set, etc.). Such data sets must be anonymized if transferred outside the data collector’s premises (e.g. hospital or national statistical office), say, for ...

Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes

The purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original ...

Anonymization of nominal data based on semantic marginality

Nominal attributes are very common in data sets about individuals, specifically medical data like patient healthcare records. Attributes of this type tend to be sensitive due to their personal nature. If public-use data sets need to be released, e.g. for clinical research purposes, data should be first anonymized. However, since most anonymization methods omit data semantics when dealing with n...

Elite Opposition-based Artificial Bee Colony Algorithm for Global Optimization

 Numerous problems in engineering and science can be converted into optimization problems. Artificial bee colony (ABC) algorithm is a newly developed stochastic optimization algorithm and has been widely used in many areas. However, due to the stochastic characteristics of its solution search equation, the traditional ABC algorithm often suffers from poor exploitation. Aiming at this weakness o...

NUMERICAL TAXONOMIC STUDY OF THE IRANIAN SPECIES OF ALYSSUM L. BASED ON MORPHOLOGICAL CHARACTERS

The genus Alyssum L. belongs to the subtribe Alyssinae, tribe Alysseae and family Cruciferae (Brassicaceae). This genus is one of the largest genera of the family of Cruciferae in Iran, and seems to be the most problematic genus in which the boundary of certain species is not completely clear due to the polymorphism of morphological characters. The main objective of this research is to stud...

Publication date: 2012